Abstract

Pursuit-Evasion Games (in discrete time) are stochastic games
with nonnegative daily payoffs, with the final payoff being the
cumulative sum of payoffs during the game. We show that such games
admit a value even in the presence of incomplete information and that
this value is uniform, i.e. there are $\varepsilon$-optimal strategies for both
players that are $\varepsilon$-optimal in any long enough prefix of the game. We give
an example to demonstrate that nonnegativity is essential and extend
the results to Leavable Games.

1 Introduction



Games of Pursuit and Evasion are two-player zero-sum games involving a
Pursuer (P) and an Evader (E). P's goal is to capture E, and the game
consists of the space of possible locations and the allowed motions for P and
E. These games are usually encountered within the domain of differential
games, i.e., the location space and the allowed motions have the cardinality
of the continuum and they tend to be of differentiable or at least continuous
nature.

The subject of Differential Games in general, and Pursuit-Evasion Games
in particular, was pioneered in the 50s by Isaacs (1965). These games evolved
from the need to solve military problems such as airfights, as opposed to
classical game theory, which was oriented toward solving economic problems.
The basic approach was akin to differential equations techniques and
optimal control, rather than standard game theoretic tools. The underlying
assumption was that of complete information, and optimal pure strategies
were searched for. Conditions were given under which a pure-strategy
saddle point exists (see, for example, Varaiya and Lin (1969)). Usually the
solution was given together with a value function, which assigned each state
of the game its value. Complete information was an essential requirement in
this case. For a thorough introduction to Pursuit-Evasion and Differential
Games see Basar and Olsder (1999).

A complete-information continuous-time game "intuitively" shares some
relevant features with perfect-information discrete-time games. The latter
are games with complete knowledge of past actions and without simultaneous
actions. Indeed, if one player decides to randomly choose between two
pure strategies which differ from time $t_0$ on, his opponent will discover
this "immediately" after $t_0$, and can thus respond optimally almost
instantly. Assuming the payoff is continuous, the small amount of time
needed to discover the strategy chosen by the opponent should affect the
payoff negligibly. A well-known result of Martin (1975, 1985) implies that
every perfect-information discrete-time game has $\varepsilon$-optimal pure strategies
(assuming a Borel payoff function) and so should, in a sense, continuous-time
games.

Another reason to restrict oneself to pure strategies is that, unlike
discrete-time games, there is no good formal framework for continuous-time games.
By framework we mean a way to properly define the space of pure strategies
and the measurable $\sigma$-algebra on them. There are some approaches but none
is as general or complete as for discrete-time games. This kind of framework
is essential when dealing with a general incomplete information setting.
This paper will therefore deal with discrete-time Pursuit-Evasion Games.
We hope that our result will be applied in the future to discrete
approximations of continuous-time games. Pursuit-Evasion Games in discrete time are
formalized and discussed in Kumar and Shiau (1981).

Pursuit-Evasion Games are generally divided into two categories: Games 
of Kind and Games of Degree. Games of Kind deal with the question of 
capturability: whether a capture can be achieved by the Pursuer or not. 
In a complete-information setting this is a yes-or-no question, completely 
decided by the rules of the game and the starting positions. With incomplete
information incorporated, we simply assign a payoff of 1 for the event of
capture and payoff 0 otherwise. Games of Degree have the Pursuer try to
minimize a certain payoff function such as the time needed for capture. The
question of capturability is encountered here only indirectly: if the Evader
has a chance of escaping capture indefinitely, the expected time of capture
is infinity. The payoff, in general, can be any function, such as the minimal
distance between the Evader and some target set.

What unites the two categories is that the payoff function in both is 
positive and cumulative. The maximizing player, be it the Pursuer or the 
Evader, gains his payoff and never loses anything. This is in contrast with 
other classes of infinitely repeated games, such as undiscounted stochastic 
games, where the payoff is the limit of the averages of daily payoffs. 

Discrete-time stochastic games were introduced by Shapley (1953) who 
proved the existence of the discounted value in two-player zero-sum games 
with finite state and action sets. Recursive games were introduced by Everett
(1957). These are stochastic games in which the payoff is 0 except at
absorbing states, where the game terminates. In this respect they resemble
Pursuit-Evasion Games, where the payoff is obtained only when
the game terminates. The game is said to have a uniform value if $\varepsilon$-optimal
strategies exist that are also $\varepsilon$-optimal in any long enough prefix of the game.
Everett proved the existence of the uniform value for two-player, zero-sum
recursive games.

We shall now formally define Pursuit-Evasion Games to be two-player
zero-sum games with cumulative and positive payoffs. To avoid confusion, the
players will be called the Maximizer and the Minimizer, and their respective
goals should be obvious.

Our main result is the existence of a uniform value for Pursuit-Evasion
Games with incomplete information and finite action and signal sets,
followed by a generalization to arbitrary signal sets. In Section 4 we present a
different class of games to which our proof also applies. In Section 5 we show
that the positiveness requirement is indispensable by giving an appropriate
counterexample.

2 Definitions and the main Theorem 

A cumulative game with complete information is given by: 

• Two finite sets $A^1$ and $A^2$ of actions.

Define $H_n = (A^1 \times A^2)^n$ to be the set of all histories of length $n$, and
$H = \bigcup_{n=0}^{\infty} H_n$ to be the set of all finite histories.

• A daily payoff function $f : H \to \mathbb{R}$.

Let $\widetilde{H} = (A^1 \times A^2)^{\aleph_0}$ be the set of all infinite histories. The daily payoff
function induces a payoff function $p : \widetilde{H} \to \mathbb{R}$ by $p(h) = \sum_{n=0}^{\infty} f(h_n)$, where
$h_n$ is the length $n$ prefix of $h$. In the sequel we will only study the case in
which $f$ is nonnegative, so that $p$ is well defined (though it may be infinite).

The game is played in stages as follows. The initial history is $h_0 = \emptyset$.
At each stage $n \ge 0$ both players choose simultaneously and independently
actions $a \in A^1$ and $b \in A^2$, and each player is informed of the other's choice.
The new game history is $h_{n+1} = h_n {}^\frown \langle a, b \rangle$, i.e., the concatenation of
$\langle a, b \rangle$ to the current history. The infinite history of the game, $h$, is the
concatenation of all pairs of actions chosen throughout the game. The payoff
is $p(h)$, the goal of the Maximizer is to maximize the expectation of $p(h)$,
and that of the Minimizer is to minimize it.

If all the values of $f$ are nonnegative, we call the game nonnegative.
A complete information Pursuit-Evasion Game is a nonnegative cumulative 
game. 

As cumulative games are a proper superset of recursive games (see
Everett (1957)), Pursuit-Evasion Games are a proper superset of nonnegative
recursive games.

As is standard in game theory, the term "complete information" is used
to denote a game with complete knowledge of the history of the game, and
not the lack of simultaneous actions (which is termed "perfect information").

A cumulative game with incomplete information is given by: 

• Two finite sets $A^1$ and $A^2$ of actions.
Define $H_n$ and $H$ as before.

• A daily payoff function $f : H \to \mathbb{R}$.

• Two measure spaces $S^1$ and $S^2$ of signals.

• For every $h \in H$, two probability distributions $p^1_h \in \Delta(S^1)$ and $p^2_h \in \Delta(S^2)$.

Define $\widetilde{H}$ (the set of infinite histories) and $p$ as before. In particular, the signals are not a parameter
of the payoff function.

An incomplete-information cumulative game is played like a complete
information cumulative game, except that the players are not informed of
each other's actions. Instead, a signal pair $\langle s^1, s^2 \rangle \in S^1 \times S^2$ is randomly
chosen with distribution $p^1_h \times p^2_h$, $h$ being the current history of the game, with
player $i$ observing $s^i$. An incomplete-information Pursuit-Evasion Game is
an incomplete-information nonnegative cumulative game.

Define $H^i_n$ to be $(A^i \times S^i)^n$. This is the set of private histories of length
$n$ of player $i$. Similarly, define $H^i = \bigcup_{n=0}^{\infty} H^i_n$, the set of all private finite
histories, and $\widetilde{H}^i = (A^i \times S^i)^{\aleph_0}$, the set of all private infinite histories.

In a complete-information cumulative game a behavioral strategy for
player $i$ is a function $\sigma^i : H \to \Delta(A^i)$. In an incomplete-information cumulative
game a (behavioral) strategy for player $i$ is a function $\sigma^i : H^i \to \Delta(A^i)$.
Recall that by Kuhn's Theorem (Kuhn (1953)) the set of all behavioral strategies
coincides with the set of all mixed strategies, which are probability
distributions over pure strategies.

Denote the space of all behavioral strategies for player $i$ by $\Omega^i$. A profile
is a pair of strategies, one for each player. A profile $\langle \sigma^1, \sigma^2 \rangle$, together
with $\{p^i_h\}$, induces, in the obvious manner, a probability measure $\mu_{\sigma^1,\sigma^2}$ over
the set $\widetilde{H}$ of infinite histories, equipped with the product $\sigma$-algebra.

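The value of a strategy $\sigma^1$ for the Maximizer is $\mathrm{val}(\sigma^1) = \inf_{\sigma^2 \in \Omega^2} \mathbb{E}_{\mu_{\sigma^1,\sigma^2}}(p(h))$.
The value of a strategy $\sigma^2$ for the Minimizer is $\mathrm{val}(\sigma^2) = \sup_{\sigma^1 \in \Omega^1} \mathbb{E}_{\mu_{\sigma^1,\sigma^2}}(p(h))$.

When several games are discussed we will explicitly denote the value in
game $G$ by $\mathrm{val}_G$.

The lower value of the game is $\underline{\mathrm{val}}(G) = \sup_{\sigma^1 \in \Omega^1} \mathrm{val}(\sigma^1)$.

The upper value of the game is $\overline{\mathrm{val}}(G) = \inf_{\sigma^2 \in \Omega^2} \mathrm{val}(\sigma^2)$.

If $\underline{\mathrm{val}}(G) = \overline{\mathrm{val}}(G)$, the common value is the value of the game, $\mathrm{val}(G) =
\underline{\mathrm{val}}(G) = \overline{\mathrm{val}}(G)$. Observe that $\underline{\mathrm{val}}(G)$ and $\overline{\mathrm{val}}(G)$ always exist, and that
$\underline{\mathrm{val}}(G) \le \overline{\mathrm{val}}(G)$ always holds.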

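A strategy $\sigma^i$ of player $i$ is $\varepsilon$-optimal if $|\mathrm{val}(\sigma^i) - \mathrm{val}(G)| \le \varepsilon$. A strategy
is optimal if it is 0-optimal.

A cumulative game is bounded if its payoff function $p$ is bounded, i.e.
there is $B \in \mathbb{R}$ such that $-B < p(h) < B$ for every infinite history $h$.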

Let $G = \langle A^1, A^2, f \rangle$ be a cumulative game. Define $f_n$ to be equal to
$f$ for all histories of length up to $n$ and zero for all other histories. Define
$G_n = \langle A^1, A^2, f_n \rangle$. Thus, $G_n$ is the restriction of $G$ to the first $n$ stages.
Let $p_n$ be the payoff function induced by $f_n$.
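
As a small illustration of the truncation just defined, the following Python sketch computes the truncated payoff $p_n(h)$ on a finite stretch of play. The daily payoff `daily_payoff` and the sample history are hypothetical and chosen only for this example; they do not come from the paper. Because the daily payoffs are nonnegative, the truncated payoffs are nondecreasing in $n$.

```python
from typing import Callable, Sequence, Tuple

History = Tuple[Tuple[int, int], ...]  # a finite sequence of action pairs <a, b>

def truncated_payoff(f: Callable[[History], float],
                     h: Sequence[Tuple[int, int]], n: int) -> float:
    """p_n(h): the sum of f over the prefixes of h of length 0..n."""
    upto = min(n, len(h))
    return sum(f(tuple(h[:k])) for k in range(upto + 1))

def daily_payoff(prefix: History) -> float:
    """Hypothetical nonnegative daily payoff: 1 whenever the last
    action pair matches (a 'capture'), 0 otherwise."""
    if not prefix:
        return 0.0
    a, b = prefix[-1]
    return 1.0 if a == b else 0.0

history = [(0, 1), (1, 1), (0, 0), (1, 0)]
payoffs = [truncated_payoff(daily_payoff, history, n) for n in range(5)]
print(payoffs)  # [0.0, 0.0, 1.0, 2.0, 2.0]
```

The monotonicity visible in `payoffs` is the pointwise inequality that the proof of Theorem 1 relies on.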

A game $G$ is said to have a uniform value if it has a value and for each
$\varepsilon > 0$ there exist $N$ and two strategies $\sigma^1, \sigma^2$ for the two players that are
$\varepsilon$-optimal for every game $G_n$ with $n \ge N$.

The first main result is: 

Theorem 1 Every bounded Pursuit-Evasion Game with incomplete information
and finite signal sets has a uniform value. Furthermore, an optimal strategy
exists for the Minimizer.

Proof. Let $G$ be a bounded Pursuit-Evasion Game with incomplete
information. Let $G_n$ be defined as above. Since $A^1, A^2, S^1, S^2$ are all finite,
there are only a finite number of private histories of length up to $n$. $G_n$ is
equivalent to a finite-stage finite-action game, and therefore it has a value
$v_n$. From the definition of $G_n$ and since $f$ is nonnegative, for every infinite history $h$

$$p_n(h) \le p_{n+1}(h) \le p(h),$$

which implies that for all $\sigma^1 \in \Omega^1$

$$\mathrm{val}_{G_n}(\sigma^1) \le \mathrm{val}_{G_{n+1}}(\sigma^1) \le \mathrm{val}_G(\sigma^1) \qquad (1)$$

so that

$$\mathrm{val}(G_n) \le \mathrm{val}(G_{n+1}) \le \underline{\mathrm{val}}(G).$$

Therefore, $v_n$ is a nondecreasing bounded sequence and $\underline{\mathrm{val}}(G)$ is at least
$v = \lim_{n \to \infty} v_n$.

On the other hand, define $K_n = \{\sigma^2 \in \Omega^2 \mid \mathrm{val}_{G_n}(\sigma^2) \le v\}$. Since
$\mathrm{val}(G_n) = v_n \le v$, $K_n$ cannot be empty.

$K_n$ is a compact set, since the function $\mathrm{val}_{G_n}(\sigma^2)$ is continuous over $\Omega^2$,
which is compact, and $K_n$ is the preimage of the closed set $(-\infty, v]$.

For all $\sigma^2 \in \Omega^2$, $\mathrm{val}_{G_n}(\sigma^2) \le \mathrm{val}_{G_{n+1}}(\sigma^2)$, so that $K_n \supseteq K_{n+1}$. Since the
sets $K_n$ are compact and nonempty, their intersection is nonempty.

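Let $\sigma^2$ be a strategy for the Minimizer in $\bigcap_{n=0}^{\infty} K_n$. Let $\sigma^1$ be any strategy
for the Maximizer. From $p(h) = \lim_{n \to \infty} p_n(h)$ and since $p$ is bounded, we
get by the monotone convergence theorem

$$\mathbb{E}_{\mu_{\sigma^1,\sigma^2}}(p(h)) = \lim_{n \to \infty} \mathbb{E}_{\mu_{\sigma^1,\sigma^2}}(p_n(h)).$$

Since $\sigma^2$ belongs to $K_n$, $\mathbb{E}_{\mu_{\sigma^1,\sigma^2}}(p_n(h)) \le v$ and therefore $\mathbb{E}_{\mu_{\sigma^1,\sigma^2}}(p(h)) \le v$.
Since $\sigma^1$ is arbitrary, $\mathrm{val}(\sigma^2) \le v$, so that $\overline{\mathrm{val}}(G) \le v$. Consequently, $v$ is
the value of $G$.

Notice that any $\sigma^2 \in \bigcap_{n=0}^{\infty} K_n$ has $\mathrm{val}_G(\sigma^2) = v$ and is therefore an
optimal strategy for the Minimizer.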

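Given $\varepsilon > 0$ choose $N$ such that $v_N > v - \varepsilon$. Let $\sigma^1$ be an optimal strategy
for the Maximizer in $G_N$, and let $\sigma^2 \in \bigcap_{n=0}^{\infty} K_n$. By (1),

$$\forall n \ge N \quad v_n - \varepsilon \le v - \varepsilon < v_N = \mathrm{val}_{G_N}(\sigma^1) \le \mathrm{val}_{G_n}(\sigma^1)$$

so that $\sigma^1$ is $\varepsilon$-optimal in $G_n$. As $\sigma^2 \in K_n$ one has $\mathrm{val}_{G_n}(\sigma^2) \le v < v_n + \varepsilon$ so
that $\sigma^2$ is $\varepsilon$-optimal in $G_n$.

These strategies are $\varepsilon$-optimal in all games $G_n$ for $n \ge N$. Thus, the value
is uniform. ■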

Remark: Most of the assumptions on the game $G$ are irrelevant for the
proof of the theorem and were given only for simplicity of description.

1. The action sets $A^i$ and the signal sets $S^i$ may depend respectively on
the private histories $H^i_n$.

2. The signals $\langle s^1, s^2 \rangle$ may be correlated, i.e. chosen from a common
distribution $p_h \in \Delta(S^1 \times S^2)$.

3. The game can be made stochastic simply by adding a third player, Nature,
with a known behavioral strategy. The action set for Nature can
be countable, since it could always be approximated by large enough
finite sets. The action sets for the Maximizer can be infinite as long as
the signal set $S^2$ is still finite (so the number of pure strategies for the
Minimizer in $G_n$ is still finite).

4. Since the bound on payoffs was only used to bound the values of $G_n$,
one can drop the boundedness assumption, as long as the sequence $\{v_n\}$
is bounded. If it is unbounded then $G$ has infinite uniform value in
the sense that the Maximizer can achieve as high a payoff as he desires.

3 Arbitrary signal sets 

Obviously, the result still holds if we replace the signal set $S$ by a sequence
of signal sets $S_n$, all of which are finite, such that the signals for histories
of length $n$ belong to $S_n$. The signal sets, like the action sets, can change
according to past actions, but since there are only finitely many possible
histories of length $n$, this is purely semantic.

What about signals chosen from an infinite set? If the set $S$ is countable
then we can approximate it with finite sets $S_n$, chosen such that for any
history $h$ of length $n$ the chance we get a signal outside $S_n$ is negligible.
We won't go into details because the next argument applies to both the
countable and the uncountable cases.

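A cumulative game $G$ is $\varepsilon$-approximated by a game $G'$ if $G'$ has the same
strategy spaces as $G$ and for any pair of strategies $\sigma, \tau$

$$|p_G(\sigma, \tau) - p_{G'}(\sigma, \tau)| < \varepsilon,$$

where $p_G(\sigma, \tau)$ denotes the expected payoff in $G$ under the profile $\langle \sigma, \tau \rangle$.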

Lemma 2 If $G$ is a bounded Pursuit-Evasion Game with incomplete information
then $G$ can be $\varepsilon$-approximated by a Pursuit-Evasion Game with
incomplete information with the same action sets and payoffs which can be
simulated using a sequence of finite signal sets.

Proof. Let $G$ be such a game. Assume, w.l.o.g., that the payoff function $p$
is bounded by 1. Fix a positive $\varepsilon$. Let $\varepsilon_n = \varepsilon/2^n$. Define $p^i_n = \sum_{h \in H_n} p^i_h / |H_n|$,
the mean distribution of the signals at stage $n$. Every distribution $p^i_h$ of time
$n$ is absolutely continuous with respect to $p^i_n$. By the Radon-Nikodym theorem,
a density function $f^i_h$ exists such that $p^i_h(E) = \int_E f^i_h \, dp^i_n$. Clearly, $f^i_h$ is
essentially bounded by $|H_n|$.

Let $S'^i_n$ be $\{0, \varepsilon_n, 2\varepsilon_n, 3\varepsilon_n, \ldots, \lfloor |H_n|/\varepsilon_n \rfloor \varepsilon_n\}^{|H_n|}$. For $h \in H_n$ define $f'^i_h$ to
be $f^i_h$ rounded down to the nearest multiple of $\varepsilon_n$. Define $F^i_n : S^i \to S'^i_n$ by
$F^i_n(s) = \{f'^i_h(s)\}_{h \in H_n}$. Let $G'$ be the same game as $G$ except that the players
observe the signals $F^i_n(s^i) \in S'^i_n$, where $s^i$ is the original signal with density
$f^i_h$.

Given a signal $s'^i$ in $S'^i_n$ one can project it back onto $S^i$ by choosing
from a uniform distribution (with respect to the measure $p^i_n$) over the set
$E(s'^i) = (F^i_n)^{-1}(s'^i)$. Let $G''$ be the game $G$ except that the signals are chosen
with the distribution just described. Denote their density function by $f''^i_h$.
This game can be simulated using only the signals in $G'$ and vice versa, so
they are equivalent.

$G$ and $G''$ have exactly the same strategy spaces. The only difference is
a different distribution of the signals. But from the way the signals in $G''$ were
constructed it is obvious that the density functions $f''^i_h$ do not differ from $f^i_h$
by more than $\varepsilon_n$ for any history $h$ of length $n$. Given a profile $\langle \sigma^1, \sigma^2 \rangle$,
denote the generated distributions on the infinite histories in $G$ and $G''$ by $\mu$ and $\mu''$. The
payoffs are $p_G(\sigma^1, \sigma^2) = \int p \, d\mu$ and $p_{G''}(\sigma^1, \sigma^2) = \int p \, d\mu''$. But the distance,
in total variation metric, between $\mu$ and $\mu''$ cannot be more than the sum
of distances between the distributions of signals at each stage, which is no
more than $\sum_n \varepsilon_n = \varepsilon$. By definition of total variation metric, the difference
between $\int p \, d\mu$ and $\int p \, d\mu''$ cannot be more than $\varepsilon$. ■
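
The rounding step of the proof can be checked numerically. The sketch below is only an illustration under assumed values: the accuracy `eps` and the density value `density_value` are hypothetical, and only the rounding of a single density value is modelled, not the full projection that defines $G''$. It confirms that rounding down to a multiple of $\varepsilon_n$ loses less than $\varepsilon_n$, and that the steps $\varepsilon/2^n$ for $n \ge 1$ sum to less than $\varepsilon$.

```python
from fractions import Fraction

def round_down(x: Fraction, step: Fraction) -> Fraction:
    """Largest multiple of `step` that does not exceed x (for x >= 0)."""
    return (x // step) * step

eps = Fraction(1, 10)                        # hypothetical accuracy
steps = [eps / 2**n for n in range(1, 21)]   # eps_n = eps / 2^n, n = 1..20

density_value = Fraction(7, 3)               # hypothetical density value at one signal
errors = [density_value - round_down(density_value, s) for s in steps]

# Each rounding error lies in [0, eps_n); the partial sums of eps_n stay below eps.
print(all(0 <= e < s for e, s in zip(errors, steps)), sum(steps) < eps)
```

Exact rational arithmetic is used so that the comparisons are not affected by floating-point rounding.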

Theorem 3 If $G$ is as in Lemma 2, with bounded nonnegative payoffs, it
has a uniform value.

Proof. Let $G$ be such a game, and for any $\varepsilon$ let $G_\varepsilon$ be an $\varepsilon$-approximation
of $G$ produced by the lemma. $G_\varepsilon$ is equivalent to a game with finite signal
sets and therefore has a value according to Theorem 1, denoted $v_\varepsilon$. It is
immediate from the definition of $\varepsilon$-approximation that $\underline{v}$, the lower value of
$G$, cannot be less than $v_\varepsilon - \varepsilon$, and likewise the upper value $\overline{v}$ is no more than $v_\varepsilon + \varepsilon$. $\overline{v} - \underline{v}$ is
therefore at most $2\varepsilon$. But $\varepsilon$ was chosen arbitrarily, so that $\underline{v} = \overline{v}$.

Given $\varepsilon > 0$ let $\sigma^1$ and $\sigma^2$ be $\varepsilon/2$-optimal strategies in $G_{\varepsilon/2}$ that are also
$\varepsilon/2$-optimal in any prefix of $G_{\varepsilon/2}$ longer than $N$. Clearly these strategies are
$\varepsilon$-optimal in any $G_n$ with $n > N$. Thus, the value is uniform. ■

4 Leavable games 

Leavable games are cumulative games in which one of the players, say the
Maximizer, but not his opponent, is allowed to leave the game at any stage.
The obvious way to model this class of games would be to add a "stopping" 
stage between any two original stages, where the Maximizer will choose to 
either "stop" or "continue" the game. However, we would also like to force 
the Maximizer to "stop" at some stage. Unfortunately, it is impossible to do 
so and still remain within the realm of cumulative games, so we will have to 
deal with it a bit differently. 

Leavable games were introduced by Maitra and Sudderth (1992) as an 
extension to similar concepts in the theory of gambling. They proved that a 
leavable game with complete information and finite action sets has a value. 
We will prove that the same is true for leavable games with incomplete in- 
formation. 

Let $G$ be a cumulative game with incomplete information. A stop rule for
player $i$ is a function $s : \widetilde{H}^i \to \mathbb{N}$ on the private infinite histories of
player $i$ such that if $s(h) = n$ and $h'$ coincides with
$h$ in the first $n$ coordinates, then $s(h') = n$. A leavable game with incomplete
information $L(G)$ is given by a cumulative game with incomplete information
$G$ but is played differently, as follows. Instead of playing in stages, both players
choose their behavioral strategies simultaneously, with the Maximizer also
choosing a stop rule $s$. The game is played according to these strategies and
the payoff is $p(h) = \sum_{n=0}^{s(h^1)} f(h_n)$, where $h^1$ is the Maximizer's private infinite
history.

Theorem 4 A bounded leavable game with incomplete information and finite
signal sets has a value and that value is uniform. Furthermore, an optimal
strategy exists for the Minimizer.

Proof. The proof is essentially identical to the proof of Theorem 1. $L_n$ is
defined to be the game where the Maximizer is forced to choose a stop rule
$s \le n$. $L_n$ is thus equivalent to $G_n$ in the proof of Theorem 1.

The major point we should observe is that if $A^1$ and $S^1$ are finite, any
stop rule $s$ of the Maximizer is uniformly bounded: there is $B$ such that $s(h) \le B$
for every private infinite history $h$. This
implies that any pure strategy for the Maximizer in $L$ actually belongs to
some $L_n$. Therefore, a strategy $\sigma^2$ for the Minimizer with $\mathrm{val}_{L_n}(\sigma^2) \le v$ for
all $n$ has $\mathrm{val}_L(\sigma^2) \le v$. ■

5 Counterexamples 

The question arises whether positiveness is an essential or just a technical
requirement. Both our proof and the alternative proof outlined need the
positiveness in an essential way, but is it still possible that every cumulative
game has a value?

The answer is negative. We shall provide a simple counterexample of a
cumulative game (actually a stopping game, see Dynkin (1969)) with incomplete
information without a value.

The game is as follows: at the outset of the game a bit $b$ (0 or 1) is chosen
randomly, with some probability $p > 0$ to be 1 and probability $1 - p$ to be
0. The Maximizer is informed of the value of $b$ but the Minimizer is not. Then
the following game is played. At each odd stage the Maximizer may opt to
"stop" the game, and the payoff is $-1$ if $b = 0$ and 1 if $b = 1$. At each even
stage the Minimizer may opt to "stop" the game, and the payoff is $-1$ if $b = 0$
and some $A > 1/p$ if $b = 1$.

The payoff before and after someone decides to "stop" the game is zero. 
This is a very simple stopping game with only one "unknown" parameter, 
yet, as we now argue, it has no value. 

Claim 5 The upper value of this game is $p$.

Proof. To see that $\overline{\mathrm{val}}(G) \le p$ let the Minimizer's strategy be to continue at
all stages. The Maximizer cannot gain more than $p \cdot 1 + (1 - p) \cdot 0 = p$ against
this strategy, so the upper value cannot be higher than $p$.

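On the other hand, let $\sigma$ be a strategy for the Minimizer. It consists of
$\{\sigma_i\}_{i=1}^{\infty}$, the probabilities of stopping at stage $i$, and $\sigma_\infty = 1 - \sum_{i=1}^{\infty} \sigma_i$, the
probability of never choosing "stop".

Fix $\varepsilon > 0$ and let $N$ be an odd integer such that $\sum_{i=N+1}^{\infty} \sigma_i < \varepsilon$. Let $\tau$ be
the following strategy for the Maximizer: if $b = 0$ never stop, if $b = 1$ stop
at stage $N$. The payoff under $\langle \sigma, \tau \rangle$ is:

$$p \sum_{i=1}^{N} \sigma_i A + p\Big(\sum_{i=N+1}^{\infty} \sigma_i + \sigma_\infty\Big) \cdot 1 + (1-p) \sum_{i=1}^{\infty} \sigma_i (-1) + (1-p)\, \sigma_\infty \cdot 0$$

$$= p\Big(\sum_{i=1}^{\infty} \sigma_i + \sigma_\infty\Big) + \sum_{i=1}^{N} \sigma_i (pA - 1) + \sum_{i=N+1}^{\infty} \sigma_i (p - 1) \ge p - \varepsilon$$

where the last inequality holds since $pA - 1 > 0$ and $\sum_{i=N+1}^{\infty} \sigma_i < \varepsilon$.
Therefore $\overline{\mathrm{val}}(G) \ge p$. ■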

Claim 6 The lower value of this game is $p - \frac{1-p}{A}$.

Proof. Let the Maximizer play the following strategy: if $b = 1$, stop at
time 1 with probability $1 - \frac{1-p}{pA}$ and continue otherwise. If the Minimizer
never decides to stop the payoff will be $p\left(1 - \frac{1-p}{pA}\right) \cdot 1 + (1-p) \cdot 0 = p - \frac{1-p}{A}$. If
the Minimizer decides to stop at any stage, the payoff will be
$p\left(1 - \frac{1-p}{pA}\right) + p \cdot \frac{1-p}{pA} \cdot A + (1-p)(-1) = p - \frac{1-p}{A}$. Clearly any mix of these pure strategies will
also result in a payoff of exactly $p - \frac{1-p}{A}$.

To see that the Maximizer cannot guarantee more, assume to the contrary
that there exists a strategy $\sigma$ for the Maximizer with $\mathrm{val}(\sigma) > p - \frac{1-p}{A}$. This
strategy consists of the probabilities $\{\sigma^0_i\}_{i=1}^{\infty}$ of stopping at stage $i$ if $b = 0$,
and $\{\sigma^1_i\}_{i=1}^{\infty}$ if $b = 1$.

By our assumption, the payoff against any strategy for the Minimizer
should be more than $p - \frac{1-p}{A}$. Let the Minimizer always choose to continue.
The expected payoff in that case is

$$p\Big(\sum_{i=1}^{\infty} \sigma^1_i\Big) \cdot 1 + (1-p)\Big(\sum_{i=1}^{\infty} \sigma^0_i\Big)(-1) > p - \frac{1-p}{A},$$

which implies

$$\sum_{i=1}^{\infty} \sigma^1_i > 1 - \frac{1-p}{pA}.$$

Let $N$ be sufficiently large such that $\sum_{i=1}^{N} \sigma^1_i > 1 - \frac{1-p}{pA}$. Consider the
following strategy for the Minimizer: continue until stage $N$ and then stop.
The payoff will be

$$p \sum_{i=1}^{N} \sigma^1_i + p\Big(1 - \sum_{i=1}^{N} \sigma^1_i\Big) A + (1-p)(-1)$$

$$= p + p\Big(1 - \sum_{i=1}^{N} \sigma^1_i\Big)(A - 1) + (1-p)(-1)$$

$$< p + \frac{1-p}{A}(A - 1) - (1-p) = p - \frac{1-p}{A},$$

a contradiction. ■
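
The computations in Claims 5 and 6 can be checked for particular parameters. The following Python sketch is an illustration only: the values of `p`, `A`, `N` and the Minimizer mix `sigma` are hypothetical choices satisfying $A > 1/p$, and the function names are introduced here for the example. It evaluates the Maximizer's equalizing strategy from the proof of Claim 6 against several pure Minimizer strategies, and the Maximizer's reply from the proof of Claim 5 against a mixed Minimizer strategy.

```python
from fractions import Fraction
from typing import Dict, Optional

def stop_payoff(b: int, max_stop: Optional[int], min_stop: Optional[int],
                A: Fraction) -> Fraction:
    """Payoff when the Maximizer plans to stop at odd stage `max_stop` and
    the Minimizer at even stage `min_stop` (None means "never stop")."""
    if max_stop is None and min_stop is None:
        return Fraction(0)          # nobody ever stops
    if b == 0:
        return Fraction(-1)         # any stop pays -1 when b = 0
    maximizer_first = min_stop is None or (
        max_stop is not None and max_stop < min_stop)
    return Fraction(1) if maximizer_first else A

def equalizer_payoff(p: Fraction, A: Fraction,
                     min_stop: Optional[int]) -> Fraction:
    """Maximizer strategy from the proof of Claim 6 against a pure
    Minimizer strategy: if b = 1, stop at stage 1 with probability 1 - x."""
    x = (1 - p) / (p * A)           # probability of never stopping when b = 1
    return (p * (1 - x) * stop_payoff(1, 1, min_stop, A)
            + p * x * stop_payoff(1, None, min_stop, A)
            + (1 - p) * stop_payoff(0, None, min_stop, A))

def reply_payoff(p: Fraction, A: Fraction,
                 sigma: Dict[Optional[int], Fraction], N: int) -> Fraction:
    """Maximizer reply from the proof of Claim 5 (b = 0: never stop;
    b = 1: stop at odd stage N) against a mixed Minimizer strategy `sigma`,
    given as a map from stopping stage (or None) to probability."""
    return sum(q * (p * stop_payoff(1, N, k, A)
                    + (1 - p) * stop_payoff(0, None, k, A))
               for k, q in sigma.items())

p, A = Fraction(1, 2), Fraction(3)  # hypothetical parameters with A > 1/p
lower = p - (1 - p) / A             # lower value stated in Claim 6
equalized = [equalizer_payoff(p, A, k) for k in (None, 2, 4, 10)]

# Hypothetical Minimizer mix: stop at stage 2, 4 or 6, or never.
sigma = {2: Fraction(1, 2), 4: Fraction(1, 4), 6: Fraction(1, 8),
         None: Fraction(1, 8)}
N = 5
tail = sum(q for k, q in sigma.items() if k is not None and k > N)
reply = reply_payoff(p, A, sigma, N)

print(lower, equalized, reply, p - tail)
```

With these parameters the equalizing strategy yields exactly $1/3 = p - \frac{1-p}{A}$ against every pure Minimizer strategy tried, while the reply earns $13/16$, which is at least $p$ minus the tail mass $1/8$. The gap between the upper value $p = 1/2$ and the lower value $1/3$ is the quantity $\frac{1-p}{A} = 1/6$.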